Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Rule-based Search in Text Databases with Nonstandard Orthography

Internal identifier: 001016 (Main/Exploration); previous: 001015; next: 001017

Authors: Thomas Pilz [Germany]; Wolfram Luther [Germany]; Norbert Fuhr [Germany]; Ulrich Ammon [Germany]

Source: Literary and Linguistic Computing [0268-1145]; 2006-06; vol. 21, no. 2, pp. 179-186

RBID : ISTEX:623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65

Abstract

In this article, we describe our interdisciplinary project ‘Rule-based search in text databases with nonstandard orthography (RSNSR)’ in support of the conservation of cultural heritage, especially for the German reception of the philosopher Nietzsche. We present a rule-based fuzzy search engine that allows users to retrieve text data independently of its orthographical realization. The rules used are derived from statistical analyses, historical publications, linguistic principles, and expert knowledge. Our Web-based tool is intended for experts as well as interested amateurs. Along with its present features, further functions are currently worked out. Among them are automatic rule derivation and finer result classification through a generalized Levenshtein similarity measure. Our work is associated with the recently launched project Deutsch Diachron Digital (DDD) to build a complete diachronic corpus of German for the first time with texts from the ninth century (Old High German) to the present (Modern German).
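
The abstract mentions result classification through a generalized Levenshtein similarity measure but gives no details. As a rough illustration only, the sketch below shows one common way to generalize Levenshtein distance: each edit operation gets its own weight, so spelling differences that are typical of historical orthography cost less than arbitrary edits. The function names, the weight tables, the weight values, and the example word pairs are all assumptions made for this sketch; they are not taken from the article or from the RSNSR tool.

```python
def weighted_levenshtein(a, b, sub_cost=None, indel_cost=None):
    """Edit distance with per-character weights (hypothetical scheme).

    sub_cost maps (char, char) pairs to a substitution cost,
    indel_cost maps a char to its insertion/deletion cost.
    Any operation not listed costs 1.0.
    """
    sub_cost = sub_cost or {}
    indel_cost = indel_cost or {}

    def sub(x, y):
        if x == y:
            return 0.0
        # Treat substitution weights as symmetric.
        return sub_cost.get((x, y), sub_cost.get((y, x), 1.0))

    def indel(x):
        return indel_cost.get(x, 1.0)

    # prev[j] = cost of turning the processed prefix of a into b[:j]
    prev = [0.0]
    for cb in b:
        prev.append(prev[-1] + indel(cb))

    for ca in a:
        cur = [prev[0] + indel(ca)]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + indel(ca),         # delete ca
                cur[j - 1] + indel(cb),      # insert cb
                prev[j - 1] + sub(ca, cb),   # substitute ca -> cb (or match)
            ))
        prev = cur
    return prev[-1]


def similarity(a, b, **weights):
    """Normalize the distance to [0, 1], assuming all weights are <= 1.0."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_levenshtein(a, b, **weights) / longest


# Illustrative weights only: a cheap 'h' deletion and a cheap y/i swap.
print(weighted_levenshtein("Theil", "Teil", indel_cost={"h": 0.25}))
print(weighted_levenshtein("seyn", "sein", sub_cost={("y", "i"): 0.25}))
```

With all weights left at their default of 1.0, the function reduces to the ordinary Levenshtein distance. In a setting like the one the abstract describes, the weights would presumably come from the derived rules rather than being set by hand as they are here.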

Url: https://api.istex.fr/document/623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65/fulltext/pdf
DOI: 10.1093/llc/fql020


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Rule-based Search in Text Databases with Nonstandard Orthography</title>
<author>
<name sortKey="Pilz, Thomas" sort="Pilz, Thomas" uniqKey="Pilz T" first="Thomas" last="Pilz">Thomas Pilz</name>
</author>
<author>
<name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
</author>
<author>
<name sortKey="Fuhr, Norbert" sort="Fuhr, Norbert" uniqKey="Fuhr N" first="Norbert" last="Fuhr">Norbert Fuhr</name>
</author>
<author>
<name sortKey="Ammon, Ulrich" sort="Ammon, Ulrich" uniqKey="Ammon U" first="Ulrich" last="Ammon">Ulrich Ammon</name>
</author>
<author wicri:is="90%">
<name sortKey="Ammon, Ulrich" sort="Ammon, Ulrich" uniqKey="Ammon U" first="Ulrich" last="Ammon">Ulrich Ammon</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1093/llc/fql020</idno>
<idno type="url">https://api.istex.fr/document/623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000389</idno>
<idno type="wicri:Area/Istex/Curation">000383</idno>
<idno type="wicri:Area/Istex/Checkpoint">000989</idno>
<idno type="wicri:doubleKey">0268-1145:2006:Pilz T:rule:based:search</idno>
<idno type="wicri:Area/Main/Merge">001033</idno>
<idno type="wicri:Area/Main/Curation">001016</idno>
<idno type="wicri:Area/Main/Exploration">001016</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Rule-based Search in Text Databases with Nonstandard Orthography</title>
<author>
<name sortKey="Pilz, Thomas" sort="Pilz, Thomas" uniqKey="Pilz T" first="Thomas" last="Pilz">Thomas Pilz</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen</wicri:regionArea>
<wicri:noRegion>University of Duisburg-Essen</wicri:noRegion>
<wicri:noRegion>University of Duisburg-Essen</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen</wicri:regionArea>
<wicri:noRegion>University of Duisburg-Essen</wicri:noRegion>
<wicri:noRegion>University of Duisburg-Essen</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Fuhr, Norbert" sort="Fuhr, Norbert" uniqKey="Fuhr N" first="Norbert" last="Fuhr">Norbert Fuhr</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen</wicri:regionArea>
<wicri:noRegion>University of Duisburg-Essen</wicri:noRegion>
<wicri:noRegion>University of Duisburg-Essen</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Ammon, Ulrich" sort="Ammon, Ulrich" uniqKey="Ammon U" first="Ulrich" last="Ammon">Ulrich Ammon</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute of Computer Science and Interactive Systems, University of Duisburg-Essen</wicri:regionArea>
<wicri:noRegion>University of Duisburg-Essen</wicri:noRegion>
<wicri:noRegion>University of Duisburg-Essen</wicri:noRegion>
</affiliation>
</author>
<author wicri:is="90%">
<name sortKey="Ammon, Ulrich" sort="Ammon, Ulrich" uniqKey="Ammon U" first="Ulrich" last="Ammon">Ulrich Ammon</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute of German Language and Literature Studies, University of Duisburg-Essen</wicri:regionArea>
<wicri:noRegion>University of Duisburg-Essen</wicri:noRegion>
<wicri:noRegion>University of Duisburg-Essen</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Correspondence: T. Pilz, Institute of Computer Science and Interactive systems, University of Duisburg-Essen, D-47048, Duisburg, Lotharstr. 56</wicri:regionArea>
<wicri:noRegion>Lotharstr. 56</wicri:noRegion>
<wicri:noRegion>Lotharstr. 56</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Literary and Linguistic Computing</title>
<title level="j" type="abbrev">Lit Linguist Computing</title>
<idno type="ISSN">0268-1145</idno>
<idno type="eISSN">1477-4615</idno>
<imprint>
<publisher>Oxford University Press</publisher>
<date type="published" when="2006-06">2006-06</date>
<biblScope unit="volume">21</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="179">179</biblScope>
<biblScope unit="page" to="186">186</biblScope>
</imprint>
<idno type="ISSN">0268-1145</idno>
</series>
<idno type="istex">623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65</idno>
<idno type="DOI">10.1093/llc/fql020</idno>
<idno type="local">fql020</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0268-1145</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this article, we describe our interdisciplinary project ‘Rule-based search in text databases with nonstandard orthography (RSNSR)’ in support of the conservation of cultural heritage, especially for the German reception of the philosopher Nietzsche. We present a rule-based fuzzy search engine that allows users to retrieve text data independently of its orthographical realization. The rules used are derived from statistical analyses, historical publications, linguistic principles, and expert knowledge. Our Web-based tool is intended for experts as well as interested amateurs. Along with its present features, further functions are currently worked out. Among them are automatic rule derivation and finer result classification through a generalized Levenshtein similarity measure. Our work is associated with the recently launched project Deutsch Diachron Digital (DDD) to build a complete diachronic corpus of German for the first time with texts from the ninth century (Old High German) to the present (Modern German).</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Allemagne</li>
</country>
</list>
<tree>
<country name="Allemagne">
<noRegion>
<name sortKey="Pilz, Thomas" sort="Pilz, Thomas" uniqKey="Pilz T" first="Thomas" last="Pilz">Thomas Pilz</name>
</noRegion>
<name sortKey="Ammon, Ulrich" sort="Ammon, Ulrich" uniqKey="Ammon U" first="Ulrich" last="Ammon">Ulrich Ammon</name>
<name sortKey="Ammon, Ulrich" sort="Ammon, Ulrich" uniqKey="Ammon U" first="Ulrich" last="Ammon">Ulrich Ammon</name>
<name sortKey="Ammon, Ulrich" sort="Ammon, Ulrich" uniqKey="Ammon U" first="Ulrich" last="Ammon">Ulrich Ammon</name>
<name sortKey="Fuhr, Norbert" sort="Fuhr, Norbert" uniqKey="Fuhr N" first="Norbert" last="Fuhr">Norbert Fuhr</name>
<name sortKey="Luther, Wolfram" sort="Luther, Wolfram" uniqKey="Luther W" first="Wolfram" last="Luther">Wolfram Luther</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001016 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001016 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:623F58E5380837CAE5BCFF79ED7CFBB2F09DAA65
   |texte=   Rule-based Search in Text Databases with Nonstandard Orthography
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024